Object Reconstruction


SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images

Neural Information Processing Systems

Dense 3D object reconstruction from a single image has recently witnessed remarkable advances, but supervising neural networks with ground-truth 3D shapes is impractical due to the laborious process of creating paired image-shape datasets. Recent efforts have turned to learning 3D reconstruction without 3D supervision from RGB images with annotated 2D silhouettes, dramatically reducing the cost and effort of annotation. These techniques, however, remain impractical as they still require multi-view annotations of the same object instance during training. As a result, most experimental efforts to date have been limited to synthetic datasets. In this paper, we address this issue and propose SDF-SRN, an approach that requires only a single view of objects at training time, offering greater utility for real-world scenarios. SDF-SRN learns implicit 3D shape representations to handle arbitrary shape topologies that may exist in the datasets. To this end, we derive a novel differentiable rendering formulation for learning signed distance functions (SDF) from 2D silhouettes. Our method outperforms the state of the art under challenging single-view supervision settings on both synthetic and real-world datasets.
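
The central ingredient, obtaining a silhouette from a signed distance function, can be illustrated with a toy sketch. This is not the paper's actual differentiable rendering derivation; it only shows the underlying idea that a pixel lies inside the silhouette exactly when the minimum SDF value along its camera ray is negative, with a sigmoid relaxation of that minimum giving a differentiable occupancy. The sphere SDF, ray setup, and `alpha` sharpness below are illustrative assumptions:

```python
import numpy as np

def sphere_sdf(p, radius=1.0):
    """Signed distance to an origin-centered sphere: negative inside, positive outside."""
    return np.linalg.norm(p, axis=-1) - radius

def soft_silhouette(origin, dirs, sdf, n_steps=64, far=6.0, alpha=50.0):
    """Differentiable silhouette: sample each ray, take the minimum SDF along it,
    and squash with a sigmoid so hits (min SDF < 0) map to ~1 and misses to ~0."""
    ts = np.linspace(0.0, far, n_steps)
    pts = origin + dirs[:, None, :] * ts[None, :, None]       # (rays, steps, 3)
    min_sdf = sdf(pts.reshape(-1, 3)).reshape(len(dirs), n_steps).min(axis=1)
    return 1.0 / (1.0 + np.exp(alpha * min_sdf))              # sigmoid(-alpha * sdf)

origin = np.array([0.0, 0.0, -3.0])                           # camera behind the sphere
hit = np.array([0.0, 0.0, 1.0])                               # ray through the center
miss = np.array([0.0, 3.0, 3.0]); miss /= np.linalg.norm(miss)
occ = soft_silhouette(origin, np.stack([hit, miss]), sphere_sdf)
```

The ray through the sphere yields occupancy near 1, the ray past it near 0; training then compares such rendered occupancies against annotated 2D silhouettes.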


Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Yan, Xinchen, Yang, Jimei, Yumer, Ersin, Guo, Yijie, Lee, Honglak

Neural Information Processing Systems

Understanding the 3D world is a fundamental problem in computer vision. However, learning a good representation of 3D objects is still an open problem due to the high dimensionality of the data and many factors of variation involved. In this work, we investigate the task of single-view 3D object reconstruction from a learning agent's perspective. We formulate the learning process as an interaction between 3D and 2D representations and propose an encoder-decoder network with a novel projection loss defined by the projective transformation. More importantly, the projection loss enables the unsupervised learning using 2D observation without explicit 3D supervision. We demonstrate the ability of the model in generating 3D volume from a single 2D image with three sets of experiments: (1) learning from single-class objects; (2) learning from multi-class objects and (3) testing on novel object classes. Results show superior performance and better generalization ability for 3D object reconstruction when the projection loss is involved.
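
The projection-loss idea can be sketched in a few lines. The paper's perspective transformer performs a differentiable perspective projection with grid sampling; the sketch below substitutes a simpler orthographic max-projection to show the principle, and all function names and the toy volume are illustrative assumptions:

```python
import numpy as np

def project_silhouette(voxels, axis=2):
    """Orthographic stand-in for the perspective projection: a pixel is covered
    if any voxel along its viewing ray is occupied (max over the depth axis)."""
    return voxels.max(axis=axis)

def projection_loss(voxels, target_sil, eps=1e-7):
    """Binary cross-entropy between the projected silhouette and a 2D mask,
    a training signal that needs no explicit 3D supervision."""
    pred = np.clip(project_silhouette(voxels), eps, 1.0 - eps)
    return -np.mean(target_sil * np.log(pred) + (1 - target_sil) * np.log(1 - pred))

vox = np.zeros((4, 4, 4))
vox[1, 2, :] = 1.0                       # one occupied column of voxels
sil = np.zeros((4, 4)); sil[1, 2] = 1.0  # its orthographic silhouette: one pixel
loss = projection_loss(vox, sil)         # near zero: projection matches the mask
```

Because the projection is differentiable, the 2D loss backpropagates into the predicted 3D occupancies, which is what makes training from 2D observations alone possible.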


Review for NeurIPS paper: SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images

Neural Information Processing Systems

Weaknesses: - The dependence on the camera pose somewhat limits the applicability. It also raises the question of how much the proposed method relies on high-quality camera poses. Similarly, it relies on high-quality silhouette images, and it is not quite clear how robust the method would be in practical setups with silhouettes coming from, e.g., an instance segmentation algorithm. An experiment with different levels of noise in the camera parameters and less-than-perfect silhouettes would help assess the robustness in real-world settings. Other work directly optimized the MLPs or conditioned the MLPs' predictions via latent codes; I would suspect those approaches to be more robust than a network predicting network weights.


Reviews: Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Neural Information Processing Systems

This paper attempts to reconstruct a 3D volume for an object from a single image at test time. During training it uses a number of views of the object to reconstruct a 3D volume containing the object, where the volume is broken down into voxels and the network predicts whether each voxel is occupied. The input is an image of the object alone against a white background; the authors chose to ignore color and texture in their reconstruction work. The suggested network is an encoder-decoder, where the encoder maps an image into a 3D-invariant latent representation and the decoder performs dense reconstruction of only that object.




OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields

Sawdayee, Haim, Vaxman, Amir, Bermano, Amit H.

arXiv.org Artificial Intelligence

Reconstructing 3D shapes from planar cross-sections is a challenge inspired by downstream applications like medical imaging and geographic informatics. The input is an in/out indicator function fully defined on a sparse collection of planes in space, and the output is an interpolation of the indicator function to the entire volume. Previous works addressing this sparse and ill-posed problem either produce low quality results, or rely on additional priors such as target topology, appearance information, or input normal directions. In this paper, we present OReX, a method for 3D shape reconstruction from slices alone, featuring a Neural Field as the interpolation prior. A modest neural network is trained on the input planes to return an inside/outside estimate for a given 3D coordinate, yielding a powerful prior that induces smoothness and self-similarities. The main challenge for this approach is high-frequency details, as the neural prior is overly smoothing. To alleviate this, we offer an iterative estimation architecture and a hierarchical input sampling scheme that encourage coarse-to-fine training, allowing the training process to focus on high frequencies at later stages. In addition, we identify and analyze a ripple-like effect stemming from the mesh extraction step. We mitigate it by regularizing the spatial gradients of the indicator function around input in/out boundaries during network training, tackling the problem at the root. Through extensive qualitative and quantitative experimentation, we demonstrate our method is robust, accurate, and scales well with the size of the input. We report state-of-the-art results compared to previous approaches and recent potential solutions, and demonstrate the benefit of our individual contributions through analysis and ablation studies.
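
The training objective described above, a data term on the slice planes plus a gradient regularizer near in/out boundaries, can be sketched as follows. OReX trains an actual neural field with automatic differentiation; here an analytic soft indicator of the unit ball stands in for the network, central differences stand in for autograd, and the sampling scheme and weight 0.1 are illustrative assumptions:

```python
import numpy as np

def indicator_loss(f, plane_pts, labels, eps=1e-7):
    """Data term: binary cross-entropy of the field against in/out labels
    sampled on the input cross-section planes."""
    p = np.clip(f(plane_pts), eps, 1.0 - eps)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def gradient_penalty(f, boundary_pts, h=1e-3):
    """Regularizer: penalize large spatial gradients of the indicator around
    the in/out boundaries, damping ripple artifacts at mesh extraction."""
    grads = []
    for i in range(3):
        e = np.zeros(3); e[i] = h
        grads.append((f(boundary_pts + e) - f(boundary_pts - e)) / (2 * h))
    return np.mean(np.stack(grads, axis=-1) ** 2)

# Stand-in "neural field": a soft indicator of the unit ball.
field = lambda p: 1.0 / (1.0 + np.exp(8.0 * (np.linalg.norm(p, axis=-1) - 1.0)))
rng = np.random.default_rng(0)
plane = rng.uniform(-2.0, 2.0, (256, 3)); plane[:, 2] = 0.0     # one slice at z = 0
labels = (np.linalg.norm(plane, axis=-1) < 1.0).astype(float)
near_boundary = plane[np.abs(np.linalg.norm(plane, axis=-1) - 1.0) < 0.2]
total = indicator_loss(field, plane, labels) + 0.1 * gradient_penalty(field, near_boundary)
```

Minimizing such a combined objective pulls the field toward the in/out labels on the planes while keeping it smooth where the boundary will be extracted.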


Efficient 3D Object Reconstruction using Visual Transformers

Agarwal, Rohan, Zhou, Wei, Wu, Xiaofeng, Li, Yuhan

arXiv.org Artificial Intelligence

Reconstructing a 3D object from a 2D image is a well-researched vision problem, with many kinds of deep learning techniques having been tried. Most commonly, 3D convolutional approaches are used, though previous work has shown state-of-the-art methods using 2D convolutions that are also significantly more efficient to train. With the recent rise of transformers for vision tasks, often outperforming convolutional methods, along with some earlier attempts to use transformers for 3D object reconstruction, we set out to use visual transformers in place of convolutions in existing efficient, high-performing techniques for 3D object reconstruction in order to achieve superior results on the task. Using a transformer-based encoder and decoder to predict 3D structure from 2D images, we achieve accuracy similar or superior to the baseline approach. This study serves as evidence for the potential of visual transformers in the task of 3D object reconstruction.
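
The front end of such a transformer encoder tokenizes the image before any attention is applied. The sketch below shows only that ViT-style step, splitting an image into non-overlapping patches and flattening each into a token; the patch size is an illustrative assumption, and positional embeddings, the linear projection, and the transformer layers themselves are omitted:

```python
import numpy as np

def patchify(img, patch=8):
    """ViT-style tokenization: split an H x W x C image into non-overlapping
    patches and flatten each patch into one token vector."""
    h, w, c = img.shape
    return (img.reshape(h // patch, patch, w // patch, patch, c)
               .transpose(0, 2, 1, 3, 4)        # (row_block, col_block, row, col, c)
               .reshape(-1, patch * patch * c)) # one flat token per patch

img = np.random.default_rng(0).random((32, 32, 3))
toks = patchify(img)   # 16 tokens of dimension 192 for a 32x32 RGB image
```

Each token then attends to every other token in the encoder, which is the mechanism these methods use in place of a convolutional receptive field.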




Dense 3D Object Reconstruction from a Single Depth View

Yang, Bo, Rosa, Stefano, Markham, Andrew, Trigoni, Niki, Wen, Hongkai

arXiv.org Artificial Intelligence

In this paper, we propose a novel approach, 3D-RecGAN, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks. For example, given a view of a chair with two rear legs occluded by the front legs, humans are easily able to guess the most likely shape behind the visible parts. Recent advances in deep neural networks and data-driven approaches show promising results on such a task. We aim to acquire the complete and high-resolution 3D shape of an object given a single depth view. By leveraging the high performance of 3D convolutional neural nets and large open datasets of 3D models, our approach learns a smooth function that maps a 2.5D view to a complete and dense 3D shape. In particular, we train an end-to-end model which estimates full volumetric occupancy from a single 2.5D depth view of an object. With naive dense voxel grids, however, the learnt 3D structure tends to be coarse and inaccurate; in order to generate higher-resolution 3D objects with efficient computation, Octree representations have recently been introduced [13] [14] [15]. Increasing the density of output 3D shapes nevertheless poses a great challenge for learning the geometric details of high-resolution 3D structures, which has yet to be explored. Recently, deep generative models have achieved impressive success in modeling complex high-dimensional data distributions, among which Generative Adversarial Networks (GANs) [16] and Variational Autoencoders (VAEs) [17] have emerged as two powerful frameworks for generative learning, including image and text generation [18] [19] and latent space learning [20] [21]. In the past few years, a number of works [22] [23] [24] [25] have applied such generative models to learn latent spaces representing 3D object shapes, in order to solve tasks such as image generation, object classification, recognition, and shape retrieval.
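
The input side of such a pipeline, turning a 2.5D depth view into a partial voxel grid that a 3D network can consume, can be sketched simply. The function name, grid resolution, and depth range below are illustrative assumptions, not the paper's exact preprocessing:

```python
import numpy as np

def depth_to_voxels(depth, res=32, near=0.0, far=1.0):
    """Quantize a 2.5D depth map (H x W, values in [near, far]) into a binary
    surface-occupancy grid: each pixel marks the voxel at its depth bin."""
    h, w = depth.shape
    grid = np.zeros((h, w, res), dtype=np.uint8)
    bins = np.clip(((depth - near) / (far - near) * (res - 1)).astype(int), 0, res - 1)
    grid[np.arange(h)[:, None], np.arange(w)[None, :], bins] = 1
    return grid

depth = np.full((32, 32), 0.5)   # a flat surface half-way through the volume
vox = depth_to_voxels(depth)     # occupied only at the matching depth slice
```

A completion network then takes such a sparse surface grid and predicts the full volumetric occupancy, including the occluded regions behind it.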


High Quality 3D Object Reconstruction from a Single Color Image

@machinelearnbot

Digitally reconstructing 3D geometry from images is a core problem in computer vision. There are various applications, such as movie productions, content generation for video games, virtual and augmented reality, 3D printing and many more. The task discussed in this blog post is reconstructing high quality 3D geometry from a single color image of an object as shown in the figure below. Humans have the ability to effortlessly reason about the shapes of objects and scenes even if we only see a single image. Note that the binocular arrangement of our eyes allows us to perceive depth, but it is not required to understand 3D geometry.